15 research outputs found

    Wide・Deepモデルを用いた機械学習を高速化するためのアルゴリズム

    Get PDF
    京都大学新制・課程博士博士(情報学)甲第23310号情博第746号新制||情||127(附属図書館)京都大学大学院情報学研究科知能情報学専攻(主査)教授 鹿島 久嗣, 教授 田中 利幸, 教授 山下 信雄学位規則第4条第1項該当Doctor of InformaticsKyoto UniversityDFA

    Fast Saturating Gate for Learning Long Time Scales with Recurrent Neural Networks

    Full text link
    Gate functions in recurrent models, such as an LSTM and GRU, play a central role in learning various time scales in modeling time series data by using a bounded activation function. However, it is difficult to train gates to capture extremely long time scales due to gradient vanishing of the bounded function for large inputs, which is known as the saturation problem. We closely analyze the relation between saturation of the gate function and efficiency of the training. We prove that the gradient vanishing of the gate function can be mitigated by accelerating the convergence of the saturating function, i.e., making the output of the function converge to 0 or 1 faster. Based on the analysis results, we propose a gate function called fast gate that has a doubly exponential convergence rate with respect to inputs by simple function composition. We empirically show that our method outperforms previous methods in accuracy and computational efficiency on benchmark tasks involving extremely long time scales.Comment: 9 pages of main texts with 4 pages appendices, 12 figure

    Smoothness Analysis of Adversarial Training

    Full text link
    Deep neural networks are vulnerable to adversarial attacks. Recent studies about adversarial robustness focus on the loss landscape in the parameter space since it is related to optimization and generalization performance. These studies conclude that the difficulty of adversarial training is caused by the non-smoothness of the loss function: i.e., its gradient is not Lipschitz continuous. However, this analysis ignores the dependence of adversarial attacks on model parameters. Since adversarial attacks are optimized for models, they should depend on the parameters. Considering this dependence, we analyze the smoothness of the loss function of adversarial training using the optimal attacks for the model parameter in more detail. We reveal that the constraint of adversarial attacks is one cause of the non-smoothness and that the smoothness depends on the types of the constraints. Specifically, the LL_\infty constraint can cause non-smoothness more than the L2L_2 constraint. Moreover, our analysis implies that if we flatten the loss function with respect to input data, the Lipschitz constant of the gradient of adversarial loss tends to increase. To address the non-smoothness, we show that EntropySGD smoothens the non-smooth loss and improves the performance of adversarial training.Comment: 22 pages, 7 figures. In V3, we add the results of EntropySGD for adversarial trainin

    Switching One-Versus-the-Rest Loss to Increase the Margin of Logits for Adversarial Robustness

    Full text link
    Adversarial training is a promising method to improve the robustness against adversarial attacks. To enhance its performance, recent methods impose high weights on the cross-entropy loss for important data points near the decision boundary. However, these importance-aware methods are vulnerable to sophisticated attacks, e.g., Auto-Attack. In this paper, we experimentally investigate the cause of their vulnerability via margins between logits for the true label and the other labels because they should be large enough to prevent the largest logit from being flipped by the attacks. Our experiments reveal that the histogram of the logit margins of na\"ive adversarial training has two peaks. Thus, the levels of difficulty in increasing logit margins are roughly divided into two: difficult samples (small logit margins) and easy samples (large logit margins). On the other hand, only one peak near zero appears in the histogram of importance-aware methods, i.e., they reduce the logit margins of easy samples. To increase logit margins of difficult samples without reducing those of easy samples, we propose switching one-versus-the-rest loss (SOVR), which switches from cross-entropy to one-versus-the-rest loss (OVR) for difficult samples. We derive trajectories of logit margins for a simple problem and prove that OVR increases logit margins two times larger than the weighted cross-entropy loss. Thus, SOVR increases logit margins of difficult samples, unlike existing methods. We experimentally show that SOVR achieves better robustness against Auto-Attack than importance-aware methods.Comment: 25 pages, 18 figure

    Absum: Simple Regularization Method for Reducing Structural Sensitivity of Convolutional Neural Networks

    Full text link
    We propose Absum, which is a regularization method for improving adversarial robustness of convolutional neural networks (CNNs). Although CNNs can accurately recognize images, recent studies have shown that the convolution operations in CNNs commonly have structural sensitivity to specific noise composed of Fourier basis functions. By exploiting this sensitivity, they proposed a simple black-box adversarial attack: Single Fourier attack. To reduce structural sensitivity, we can use regularization of convolution filter weights since the sensitivity of linear transform can be assessed by the norm of the weights. However, standard regularization methods can prevent minimization of the loss function because they impose a tight constraint for obtaining high robustness. To solve this problem, Absum imposes a loose constraint; it penalizes the absolute values of the summation of the parameters in the convolution layers. Absum can improve robustness against single Fourier attack while being as simple and efficient as standard regularization methods (e.g., weight decay and L1 regularization). Our experiments demonstrate that Absum improves robustness against single Fourier attack more than standard regularization methods. Furthermore, we reveal that robust CNNs with Absum are more robust against transferred attacks due to decreasing the common sensitivity and against high-frequency noise than standard regularization methods. We also reveal that Absum can improve robustness against gradient-based attacks (projected gradient descent) when used with adversarial training.Comment: 16 pages, 39 figure

    Fast Lasso Algorithm via Selective Coordinate Descent

    No full text
    For the AI community, the lasso proposed by Tibshirani is an important regression approach in finding explanatory predictors in high dimensional data. The coordinate descent algorithm is a standard approach to solve the lasso which iteratively updates weights of predictors in a round-robin style until convergence. However, it has high computation cost. This paper proposes Sling, a fast approach to the lasso. It achieves high efficiency by skipping unnecessary updates for the predictors whose weight is zero in the iterations. Sling can obtain high prediction accuracy with fewer predictors than the standard approach. Experiments show that Sling can enhance the efficiency and the effectiveness of the lasso

    Meta-ticket: Finding optimal subnetworks for few-shot learning within randomly initialized neural networks

    Full text link
    Few-shot learning for neural networks (NNs) is an important problem that aims to train NNs with a few data. The main challenge is how to avoid overfitting since over-parameterized NNs can easily overfit to such small dataset. Previous work (e.g. MAML by Finn et al. 2017) tackles this challenge by meta-learning, which learns how to learn from a few data by using various tasks. On the other hand, one conventional approach to avoid overfitting is restricting hypothesis spaces by endowing sparse NN structures like convolution layers in computer vision. However, although such manually-designed sparse structures are sample-efficient for sufficiently large datasets, they are still insufficient for few-shot learning. Then the following questions naturally arise: (1) Can we find sparse structures effective for few-shot learning by meta-learning? (2) What benefits will it bring in terms of meta-generalization? In this work, we propose a novel meta-learning approach, called Meta-ticket, to find optimal sparse subnetworks for few-shot learning within randomly initialized NNs. We empirically validated that Meta-ticket successfully discover sparse subnetworks that can learn specialized features for each given task. Due to this task-wise adaptation ability, Meta-ticket achieves superior meta-generalization compared to MAML-based methods especially with large NNs.Comment: Code will be available at https://github.com/dchiji-ntt/meta-ticke
    corecore